Visualizing, with the help of available datasets, the consequences of the lethal pandemic of this era, which forced human beings, social animals by nature, to distance themselves from one another within society.
To analyze and explore the reach and behavioral variations of the coronavirus disease, popularly known as COVID-19, and its effect on human lives, especially in India.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
import datetime
import plotly.express as px
from sklearn.model_selection import train_test_split
from sklearn import metrics
dataset_1=pd.read_csv("C:\\Users\\user\\OneDrive\\Desktop\\4th semester\\EDA\\Project\\covid_19_india (1).csv")
dataset2=pd.read_csv("C:\\Users\\user\\OneDrive\\Desktop\\4th semester\\EDA\\Project\\covid_vaccine_statewise.csv")
dataset_1.head()
| | Sno | Date | Time | State/UnionTerritory | ConfirmedIndianNational | ConfirmedForeignNational | Cured | Deaths | Confirmed |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2020-01-30 | 6:00 PM | Kerala | 1 | 0 | 0 | 0 | 1 |
| 1 | 2 | 2020-01-31 | 6:00 PM | Kerala | 1 | 0 | 0 | 0 | 1 |
| 2 | 3 | 2020-02-01 | 6:00 PM | Kerala | 2 | 0 | 0 | 0 | 2 |
| 3 | 4 | 2020-02-02 | 6:00 PM | Kerala | 3 | 0 | 0 | 0 | 3 |
| 4 | 5 | 2020-02-03 | 6:00 PM | Kerala | 3 | 0 | 0 | 0 | 3 |
dataset_1.tail()
| | Sno | Date | Time | State/UnionTerritory | ConfirmedIndianNational | ConfirmedForeignNational | Cured | Deaths | Confirmed |
|---|---|---|---|---|---|---|---|---|---|
| 18105 | 18106 | 2021-08-11 | 8:00 AM | Telangana | - | - | 638410 | 3831 | 650353 |
| 18106 | 18107 | 2021-08-11 | 8:00 AM | Tripura | - | - | 77811 | 773 | 80660 |
| 18107 | 18108 | 2021-08-11 | 8:00 AM | Uttarakhand | - | - | 334650 | 7368 | 342462 |
| 18108 | 18109 | 2021-08-11 | 8:00 AM | Uttar Pradesh | - | - | 1685492 | 22775 | 1708812 |
| 18109 | 18110 | 2021-08-11 | 8:00 AM | West Bengal | - | - | 1506532 | 18252 | 1534999 |
dataset_1.dtypes
Sno                          int64
Date                        object
Time                        object
State/UnionTerritory        object
ConfirmedIndianNational     object
ConfirmedForeignNational    object
Cured                        int64
Deaths                       int64
Confirmed                    int64
dtype: object
dataset_1.shape
(18110, 9)
dataset_1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18110 entries, 0 to 18109
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   Sno                       18110 non-null  int64
 1   Date                      18110 non-null  object
 2   Time                      18110 non-null  object
 3   State/UnionTerritory      18110 non-null  object
 4   ConfirmedIndianNational   18110 non-null  object
 5   ConfirmedForeignNational  18110 non-null  object
 6   Cured                     18110 non-null  int64
 7   Deaths                    18110 non-null  int64
 8   Confirmed                 18110 non-null  int64
dtypes: int64(4), object(5)
memory usage: 1.2+ MB
dataset_1.isnull()
| | Sno | Date | Time | State/UnionTerritory | ConfirmedIndianNational | ConfirmedForeignNational | Cured | Deaths | Confirmed |
|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False | False |
| 1 | False | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | False | False | False | False |
| 3 | False | False | False | False | False | False | False | False | False |
| 4 | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 18105 | False | False | False | False | False | False | False | False | False |
| 18106 | False | False | False | False | False | False | False | False | False |
| 18107 | False | False | False | False | False | False | False | False | False |
| 18108 | False | False | False | False | False | False | False | False | False |
| 18109 | False | False | False | False | False | False | False | False | False |
18110 rows × 9 columns
dataset_1.isnull().sum()
Sno                         0
Date                        0
Time                        0
State/UnionTerritory        0
ConfirmedIndianNational     0
ConfirmedForeignNational    0
Cured                       0
Deaths                      0
Confirmed                   0
dtype: int64
dataset_1.describe()
| | Sno | Cured | Deaths | Confirmed |
|---|---|---|---|---|
| count | 18110.000000 | 1.811000e+04 | 18110.000000 | 1.811000e+04 |
| mean | 9055.500000 | 2.786375e+05 | 4052.402264 | 3.010314e+05 |
| std | 5228.051023 | 6.148909e+05 | 10919.076411 | 6.561489e+05 |
| min | 1.000000 | 0.000000e+00 | 0.000000 | 0.000000e+00 |
| 25% | 4528.250000 | 3.360250e+03 | 32.000000 | 4.376750e+03 |
| 50% | 9055.500000 | 3.336400e+04 | 588.000000 | 3.977350e+04 |
| 75% | 13582.750000 | 2.788698e+05 | 3643.750000 | 3.001498e+05 |
| max | 18110.000000 | 6.159676e+06 | 134201.000000 | 6.363442e+06 |
dataset_1.rename(columns={'State/UnionTerritory': 'State'}, inplace=True)
dataset_1["State"].unique()
array(['Kerala', 'Telengana', 'Delhi', 'Rajasthan', 'Uttar Pradesh',
'Haryana', 'Ladakh', 'Tamil Nadu', 'Karnataka', 'Maharashtra',
'Punjab', 'Jammu and Kashmir', 'Andhra Pradesh', 'Uttarakhand',
'Odisha', 'Puducherry', 'West Bengal', 'Chhattisgarh',
'Chandigarh', 'Gujarat', 'Himachal Pradesh', 'Madhya Pradesh',
'Bihar', 'Manipur', 'Mizoram', 'Andaman and Nicobar Islands',
'Goa', 'Unassigned', 'Assam', 'Jharkhand', 'Arunachal Pradesh',
'Tripura', 'Nagaland', 'Meghalaya',
'Dadra and Nagar Haveli and Daman and Diu',
'Cases being reassigned to states', 'Sikkim', 'Daman & Diu',
'Lakshadweep', 'Telangana', 'Dadra and Nagar Haveli', 'Bihar****',
'Madhya Pradesh***', 'Himanchal Pradesh', 'Karanataka',
'Maharashtra***'], dtype=object)
dataset_1.drop(dataset_1[dataset_1['State']=="Unassigned"].index, inplace = True)
dataset_1.drop(dataset_1[dataset_1['State']=="Cases being reassigned to states"].index, inplace = True)
dataset_1.loc[dataset_1["State"]=="Karanataka", "State"]="Karnataka"
dataset_1.loc[dataset_1["State"]=="Bihar****", "State"]="Bihar"
dataset_1.loc[dataset_1["State"]=="Maharashtra***", "State"]="Maharashtra"
dataset_1.loc[dataset_1["State"]=="Andaman and Nicobar Islands", "State"]="Andaman & Nicobar Island"
dataset_1.loc[dataset_1["State"]=="Dadra and Nagar Haveli", "State"]="Dadara & Nagar Havelli"
dataset_1.loc[dataset_1["State"]=="Dadra and Nagar Haveli and Daman and Diu", "State"]="Dadara & Nagar Havelli"
dataset_1.loc[dataset_1["State"]=="Madhya Pradesh***", "State"]="Madhya Pradesh"
dataset_1.loc[dataset_1["State"]=="Himanchal Pradesh", "State"]="Himachal Pradesh"
dataset_1.loc[dataset_1["State"]=="Telengana", "State"]="Telangana"
dataset_1.loc[dataset_1["State"]=="Jammu and Kashmir", "State"]="Jammu & Kashmir"
dataset_1.loc[dataset_1["State"]=="Ladakh", "State"]="Jammu & Kashmir"
dataset_1.loc[dataset_1["State"]=="Delhi", "State"]="NCT of Delhi"
dataset_1.loc[dataset_1["State"]=="Arunachal Pradesh", "State"]="Arunanchal Pradesh"
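In this dataset, some state names were misspelled or carried stray markers such as asterisks. The code above maps each variant to a single label per state and drops the "Unassigned" and "Cases being reassigned to states" rows.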
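The same clean-up can be written more compactly with a single lookup table. The following is a minimal sketch on a toy frame, assuming the column has already been renamed to `State`. The names `STATE_FIXES`, `DROP_LABELS`, and `clean_states` are illustrative and not part of the notebook, and only a few of the mappings are shown.

```python
import pandas as pd

# Illustrative subset of the corrections applied in the notebook.
STATE_FIXES = {
    "Karanataka": "Karnataka",
    "Bihar****": "Bihar",
    "Maharashtra***": "Maharashtra",
    "Madhya Pradesh***": "Madhya Pradesh",
    "Himanchal Pradesh": "Himachal Pradesh",
    "Telengana": "Telangana",
}
# Placeholder labels that do not name a state.
DROP_LABELS = ["Unassigned", "Cases being reassigned to states"]


def clean_states(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with placeholder rows removed and state names unified."""
    out = df[~df["State"].isin(DROP_LABELS)].copy()
    # Series.replace with a dict swaps exact matches and leaves other values alone.
    out["State"] = out["State"].replace(STATE_FIXES)
    return out


# Toy data, made up for this example.
toy = pd.DataFrame({
    "State": ["Kerala", "Telengana", "Unassigned", "Bihar****", "Karanataka"],
    "Confirmed": [1, 2, 3, 4, 5],
})
cleaned = clean_states(toy)
print(cleaned["State"].tolist())
```

Keeping the corrections in one dictionary makes it easier to review which spellings are treated as the same state, and the function leaves the input frame unchanged.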
dataset_1.drop(columns=['Sno', 'ConfirmedIndianNational', 'ConfirmedForeignNational', 'Time'], inplace=True)
dataset_1.head(10)
| | Date | State | Cured | Deaths | Confirmed |
|---|---|---|---|---|---|
| 0 | 2020-01-30 | Kerala | 0 | 0 | 1 |
| 1 | 2020-01-31 | Kerala | 0 | 0 | 1 |
| 2 | 2020-02-01 | Kerala | 0 | 0 | 2 |
| 3 | 2020-02-02 | Kerala | 0 | 0 | 3 |
| 4 | 2020-02-03 | Kerala | 0 | 0 | 3 |
| 5 | 2020-02-04 | Kerala | 0 | 0 | 3 |
| 6 | 2020-02-05 | Kerala | 0 | 0 | 3 |
| 7 | 2020-02-06 | Kerala | 0 | 0 | 3 |
| 8 | 2020-02-07 | Kerala | 0 | 0 | 3 |
| 9 | 2020-02-08 | Kerala | 0 | 0 | 3 |
dataset_1['Date'] = pd.to_datetime(dataset_1['Date'])
print("Number of days of the data sample:",dataset_1['Date'].max()-dataset_1['Date'].min())
Number of days of the data sample: 559 days 00:00:00
statewise=dataset_1.groupby("State")[["Confirmed","Cured","Deaths"]].sum().reset_index()
statewise
| | State | Confirmed | Cured | Deaths |
|---|---|---|---|---|
| 0 | Andaman & Nicobar Island | 1938498 | 1848286 | 27136 |
| 1 | Andhra Pradesh | 392432753 | 370426530 | 2939367 |
| 2 | Arunanchal Pradesh | 7176907 | 6588149 | 26799 |
| 3 | Assam | 99837011 | 92678680 | 638323 |
| 4 | Bihar | 133662075 | 126525370 | 1112347 |
| 5 | Chandigarh | 10858627 | 10117035 | 147694 |
| 6 | Chhattisgarh | 163776262 | 151609364 | 2063920 |
| 7 | Dadara & Nagar Havelli | 1959354 | 1862102 | 1022 |
| 8 | Daman & Diu | 2 | 0 | 0 |
| 9 | Goa | 28240159 | 26027201 | 447801 |
| 10 | Gujarat | 143420082 | 132487127 | 2219448 |
| 11 | Haryana | 134347285 | 126585342 | 1502799 |
| 12 | Himachal Pradesh | 30237805 | 27701150 | 494855 |
| 13 | Jammu & Kashmir | 62172019 | 57056301 | 885498 |
| 14 | Jharkhand | 62111994 | 58034506 | 748641 |
| 15 | Karnataka | 488855931 | 444665851 | 6089959 |
| 16 | Kerala | 458906023 | 420174235 | 1888177 |
| 17 | Lakshadweep | 915784 | 820925 | 3908 |
| 18 | Madhya Pradesh | 136416921 | 127505732 | 1788258 |
| 19 | Maharashtra | 1127721063 | 1024765950 | 23868185 |
| 20 | Manipur | 12617943 | 11230568 | 173056 |
| 21 | Meghalaya | 7355969 | 6537909 | 101950 |
| 22 | Mizoram | 2984732 | 2384602 | 9791 |
| 23 | NCT of Delhi | 287227765 | 273419887 | 4943294 |
| 24 | Nagaland | 5041742 | 4519526 | 58460 |
| 25 | Odisha | 160130533 | 150923455 | 790814 |
| 26 | Puducherry | 20065891 | 18483117 | 312155 |
| 27 | Punjab | 99949702 | 91458159 | 2785594 |
| 28 | Rajasthan | 162369656 | 150356820 | 1473089 |
| 29 | Sikkim | 3186799 | 2747214 | 53150 |
| 30 | Tamil Nadu | 431928644 | 404095807 | 5916658 |
| 31 | Telangana | 130562647 | 122154512 | 750075 |
| 32 | Tripura | 14050250 | 12976846 | 150342 |
| 33 | Uttar Pradesh | 312625843 | 291479351 | 4143450 |
| 34 | Uttarakhand | 53140414 | 48362741 | 986001 |
| 35 | West Bengal | 263107876 | 247515102 | 3846989 |
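The `head()` and `tail()` outputs suggest that `Confirmed`, `Cured`, and `Deaths` are running totals reported per state per day, so summing them over every date adds each day's total on top of the previous ones. If the goal is the total per state at the end of the period, one alternative is to take the most recent row for each state. The sketch below shows the difference on toy numbers. The frame `toy` and the names `summed` and `latest` are made up for illustration, and this is not the computation used for the table above.

```python
import pandas as pd

# Toy cumulative counts for two states (made-up numbers).
toy = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2021-08-09", "2021-08-10", "2021-08-11", "2021-08-10", "2021-08-11"]
    ),
    "State": ["Kerala", "Kerala", "Kerala", "Goa", "Goa"],
    "Confirmed": [100, 110, 120, 10, 12],
    "Cured": [80, 90, 95, 8, 9],
    "Deaths": [1, 1, 2, 0, 0],
})
cols = ["Confirmed", "Cured", "Deaths"]

# Summing running totals counts the same cases once per reporting day.
summed = toy.groupby("State")[cols].sum()

# Taking the last row per state (after sorting by date) keeps the final total.
latest = toy.sort_values("Date").groupby("State")[cols].last()

print(summed)
print(latest)
```

Ratios such as a recovery rate come out similar either way on smoothly growing data, but absolute counts only match the reported totals in the `latest` version.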
We added three columns, Recovery Rate, Mortality Rate, and Active Cases, to describe the condition of each state more accurately.
statewise["Recovery Rate"] = statewise["Cured"]*100 / statewise["Confirmed"]
statewise["Mortality Rate"] = statewise["Deaths"]*100 / statewise["Confirmed"]
statewise["Active Cases"]= statewise["Confirmed"]-(statewise["Cured"]+statewise["Deaths"])
statewise.style.background_gradient(cmap='RdBu_r')
| | State | Confirmed | Cured | Deaths | Recovery Rate | Mortality Rate | Active Cases |
|---|---|---|---|---|---|---|---|
| 0 | Andaman & Nicobar Island | 1938498 | 1848286 | 27136 | 95.346294 | 1.399847 | 63076 |
| 1 | Andhra Pradesh | 392432753 | 370426530 | 2939367 | 94.392358 | 0.749012 | 19066856 |
| 2 | Arunanchal Pradesh | 7176907 | 6588149 | 26799 | 91.796494 | 0.373406 | 561959 |
| 3 | Assam | 99837011 | 92678680 | 638323 | 92.829983 | 0.639365 | 6520008 |
| 4 | Bihar | 133662075 | 126525370 | 1112347 | 94.660636 | 0.832208 | 6024358 |
| 5 | Chandigarh | 10858627 | 10117035 | 147694 | 93.170481 | 1.360154 | 593898 |
| 6 | Chhattisgarh | 163776262 | 151609364 | 2063920 | 92.571025 | 1.260207 | 10102978 |
| 7 | Dadara & Nagar Havelli | 1959354 | 1862102 | 1022 | 95.036527 | 0.052160 | 96230 |
| 8 | Daman & Diu | 2 | 0 | 0 | 0.000000 | 0.000000 | 2 |
| 9 | Goa | 28240159 | 26027201 | 447801 | 92.163791 | 1.585689 | 1765157 |
| 10 | Gujarat | 143420082 | 132487127 | 2219448 | 92.376971 | 1.547516 | 8713507 |
| 11 | Haryana | 134347285 | 126585342 | 1502799 | 94.222479 | 1.118593 | 6259144 |
| 12 | Himachal Pradesh | 30237805 | 27701150 | 494855 | 91.610982 | 1.636544 | 2041800 |
| 13 | Jammu & Kashmir | 62172019 | 57056301 | 885498 | 91.771671 | 1.424271 | 4230220 |
| 14 | Jharkhand | 62111994 | 58034506 | 748641 | 93.435265 | 1.205308 | 3328847 |
| 15 | Karnataka | 488855931 | 444665851 | 6089959 | 90.960511 | 1.245757 | 38100121 |
| 16 | Kerala | 458906023 | 420174235 | 1888177 | 91.559974 | 0.411452 | 36843611 |
| 17 | Lakshadweep | 915784 | 820925 | 3908 | 89.641771 | 0.426738 | 90951 |
| 18 | Madhya Pradesh | 136416921 | 127505732 | 1788258 | 93.467681 | 1.310877 | 7122931 |
| 19 | Maharashtra | 1127721063 | 1024765950 | 23868185 | 90.870516 | 2.116497 | 79086928 |
| 20 | Manipur | 12617943 | 11230568 | 173056 | 89.004745 | 1.371507 | 1214319 |
| 21 | Meghalaya | 7355969 | 6537909 | 101950 | 88.878963 | 1.385949 | 716110 |
| 22 | Mizoram | 2984732 | 2384602 | 9791 | 79.893337 | 0.328036 | 590339 |
| 23 | NCT of Delhi | 287227765 | 273419887 | 4943294 | 95.192708 | 1.721036 | 8864584 |
| 24 | Nagaland | 5041742 | 4519526 | 58460 | 89.642151 | 1.159520 | 463756 |
| 25 | Odisha | 160130533 | 150923455 | 790814 | 94.250267 | 0.493856 | 8416264 |
| 26 | Puducherry | 20065891 | 18483117 | 312155 | 92.112117 | 1.555650 | 1270619 |
| 27 | Punjab | 99949702 | 91458159 | 2785594 | 91.504184 | 2.786996 | 5705949 |
| 28 | Rajasthan | 162369656 | 150356820 | 1473089 | 92.601551 | 0.907244 | 10539747 |
| 29 | Sikkim | 3186799 | 2747214 | 53150 | 86.206064 | 1.667818 | 386435 |
| 30 | Tamil Nadu | 431928644 | 404095807 | 5916658 | 93.556149 | 1.369823 | 21916179 |
| 31 | Telangana | 130562647 | 122154512 | 750075 | 93.560076 | 0.574494 | 7658060 |
| 32 | Tripura | 14050250 | 12976846 | 150342 | 92.360250 | 1.070031 | 923062 |
| 33 | Uttar Pradesh | 312625843 | 291479351 | 4143450 | 93.235846 | 1.325370 | 17003042 |
| 34 | Uttarakhand | 53140414 | 48362741 | 986001 | 91.009342 | 1.855464 | 3791672 |
| 35 | West Bengal | 263107876 | 247515102 | 3846989 | 94.073619 | 1.462134 | 11745785 |
statewise1=statewise.copy(deep=True)
fig = px.pie(statewise1, values='Confirmed', names='State',width=800,height=500)
fig.update_layout(title="Confirmed cases in various states",)
statewise1=statewise.copy(deep=True)
#statewise1.loc[statewise1['Cured']< 30000000, 'State'] = 'Other states'
fig = px.pie(statewise1, values='Cured', names='State',width=800,height=500)
fig.update_layout( title="Cured cases in various states",)
statewise1=statewise.copy(deep=True)
fig = px.pie(statewise1, values='Mortality Rate', names='State',width=800,height=500)
fig.update_layout(title="Mortality rate in various states",)
statewise1=statewise.copy(deep=True)
fig = px.pie(statewise1, values='Recovery Rate', names='State',width=800,height=500)
fig.update_layout(title="Recovery rate in various states",)
cases=dataset_1.groupby("Date")[["Cured","Deaths","Confirmed"]].sum().reset_index()
fig=px.bar(cases,x='Date',y=cases.columns[3],)
fig.update_layout(
title="Total Confirmed cases vs Time",
xaxis_title="Time Period",
yaxis_title="Cases",
legend_title="Cases",
font=dict(size=14)
)
fig.layout.template = 'presentation'
fig.show()
fig=px.line(cases,x='Date',y=cases.columns[1:3],)
fig.update_layout(
title="Total Cured cases & Deaths vs Time",
xaxis_title="Time Period",
yaxis_title="Cases",
legend_title="Cases",
font=dict(size=14)
)
fig.layout.template = 'presentation'
fig.show()
dataset_1['Date']= pd.to_datetime(dataset_1['Date'])
# .copy() gives each year its own DataFrame, so adding 'Month' does not raise SettingWithCopyWarning
data_of_20 = dataset_1.loc[dataset_1.Date.dt.year==2020].copy()
data_of_21 = dataset_1.loc[dataset_1.Date.dt.year==2021].copy()
data_of_20['Month']=data_of_20['Date'].dt.month
data_of_21['Month']=data_of_21['Date'].dt.month
data20= data_of_20.groupby('Month')[['Confirmed','Deaths','Cured']].sum()
data21= data_of_21.groupby('Month')[['Confirmed','Deaths','Cured']].sum()
data20.index=pd.to_datetime( data20.index , format = '%m').strftime( '%B' )
data21.index=pd.to_datetime( data21.index , format = '%m').strftime( '%B' )
fig=px.bar(data20,x=data20.index,y=data20.columns[0],)
fig.update_layout(
title="Total Confirmed over the months in 2020",
xaxis_title="Months",
yaxis_title="Cases/deaths",
legend_title="Types",
font=dict(size=14)
)
#fig.update_traces(mode='markers+lines')
fig.layout.template = 'presentation'
fig.show()
fig=px.line(data20,x=data20.index,y=data20.columns[1:3],)
fig.update_layout(
title="Total Cured Cases vs Total Deaths over the months in 2020",
xaxis_title="Months",
yaxis_title="Cases/deaths",
legend_title="Types",
font=dict(size=14)
)
fig.update_traces(mode='markers+lines')
fig.layout.template = 'presentation'
fig.show()
fig=px.line(data21,x=data21.index,y=data21.columns[0:3],)
fig.update_layout(
title="Total Cured, Deaths, and Confirmed Cases in 2021",
xaxis_title="Months",
yaxis_title="Number",
legend_title="Types",
font=dict(size=14)
)
fig.update_traces(mode='markers+lines')
fig.layout.template = 'presentation'
fig.show()
fig=px.line(dataset_1[dataset_1['State'].isin(['Bihar', 'West Bengal', 'Jharkhand'])],x='Date',y='Deaths',color='State')
fig.update_layout(
title="Trend of Deaths in states",
xaxis_title="Time Period",
yaxis_title="Cases",
legend_title="State",
font=dict(size=14)
)
fig.layout.template = 'presentation'
fig.show()
fig=px.line(dataset_1[dataset_1['State'].isin(['Maharashtra', 'Karnataka', 'Kerala', 'Tamil Nadu', 'Andhra Pradesh'])],x='Date',y='Deaths',color='State')
fig.update_layout(
title="Trend of Deaths in states",
xaxis_title="Time Period",
yaxis_title="Cases",
legend_title="State",
font=dict(size=14)
)
fig.layout.template = 'presentation'
fig.show()
# correlate only the numeric count columns; 'Date' and 'State' are not numeric
corrMatrix=dataset_1[["Cured","Deaths","Confirmed"]].corr()
print(corrMatrix)
sns.heatmap(corrMatrix, annot = True, cmap= 'coolwarm')
              Cured    Deaths  Confirmed
Cured      1.000000  0.917492   0.997749
Deaths     0.917492  1.000000   0.918308
Confirmed  0.997749  0.918308   1.000000
<AxesSubplot:>
plt.figure(figsize=(15,15))
sns.heatmap(corrMatrix, annot=True)
<AxesSubplot:>
dataset_1.plot(kind = 'scatter',x= 'Confirmed', y='Cured', alpha= 0.45,
s=dataset_1['Deaths']/10000,c= 'Confirmed', cmap = 'jet',
label='Scatter Plot',title ='Graphical Geographical Data',figsize= (15,10));
X = np.arange(60)
X = X.reshape(-1,1)
y = dataset_1.iloc[:,-1].values.astype(float)
y
array([1.000000e+00, 1.000000e+00, 2.000000e+00, ..., 3.424620e+05,
1.708812e+06, 1.534999e+06])
y = np.diff(y)
y = y.reshape(-1,1)
y
array([[ 0.00000e+00],
[ 1.00000e+00],
[ 1.00000e+00],
...,
[ 2.61802e+05],
[ 1.36635e+06],
[-1.73813e+05]])
x=dataset_1['Confirmed']
y=dataset_1['Cured']
plt.plot(x, y, 'o', color='green')
m, b = np.polyfit(x, y, 1)
plt.plot(x, m*x+b, color='blue')
[<matplotlib.lines.Line2D at 0x18ae2e8d2d0>]
x=dataset_1['Confirmed']
y=dataset_1['Deaths']
plt.plot(x, y, 'o', color='blue')
m, b = np.polyfit(x, y, 1)
plt.plot(x, m*x+b, color='red')
[<matplotlib.lines.Line2D at 0x18ae43a4910>]
x=dataset_1['Deaths']
y=dataset_1['Cured']
plt.plot(x, y, 'o', color='yellow')
m, b = np.polyfit(x, y, 1)
#use black as color for regression line
plt.plot(x, m*x+b, color='black')
[<matplotlib.lines.Line2D at 0x18ae441c250>]
from scipy import stats
z= np.abs(stats.zscore(dataset_1['Confirmed']))
print(z)
df1=dataset_1['Confirmed']
0 0.459730
1 0.459730
2 0.459729
3 0.459727
4 0.459727
...
18105 0.530088
18106 0.336969
18107 0.061486
18108 2.141033
18109 1.876494
Name: Confirmed, Length: 18047, dtype: float64
df_outlier=df1[(z<3)]  # keeps the values with |z| < 3, i.e. the rows that are not outliers
df_outlier
0 1
1 1
2 2
3 3
4 3
...
18105 650353
18106 80660
18107 342462
18108 1708812
18109 1534999
Name: Confirmed, Length: 17665, dtype: int64
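The filter above rests on the z-score, z = (x - mean) / std, and keeps the values with |z| < 3. A minimal sketch with plain NumPy on made-up numbers follows. The names `values`, `threshold`, `kept`, and `flagged` are illustrative, and the population standard deviation (`ddof=0`) is used because that is the default of `scipy.stats.zscore`.

```python
import numpy as np

# Made-up sample: nineteen ordinary values and one extreme value.
values = np.array([10.0] * 19 + [1000.0])
threshold = 3.0

# Absolute z-score of every value, using the population standard deviation.
z = np.abs((values - values.mean()) / values.std(ddof=0))

kept = values[z < threshold]      # values treated as ordinary
flagged = values[z >= threshold]  # values treated as outliers

print(len(kept), flagged)
```

Note that a single extreme value also inflates the mean and the standard deviation, so on small samples the rule may fail to flag anything. It behaves better on large samples such as the one in this notebook.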
q1=dataset_1.quantile(0.25, numeric_only=True)
q3=dataset_1.quantile(0.75, numeric_only=True)
iqr=q3-q1   # interquartile range: third quartile minus first quartile
iqr
Cured        277287.0
Deaths         3635.5
Confirmed    296893.5
dtype: float64
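The interquartile range computed above is commonly used for a second outlier rule, which flags values lying more than 1.5 times the IQR below the first quartile or above the third. The notebook does not apply this rule. The sketch below only illustrates it on a made-up series, and the 1.5 multiplier is the conventional choice, not something taken from the source.

```python
import pandas as pd

# Made-up series with one clearly extreme value.
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], name="Confirmed")

q1 = s.quantile(0.25)
q3 = s.quantile(0.75)
iqr = q3 - q1

# Fences: anything outside [lower, upper] is flagged.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = s[(s < lower) | (s > upper)]

print(iqr, lower, upper)
print(outliers.tolist())
```

Unlike the z-score, the quartiles are barely affected by the extreme value itself, which is why this rule is often preferred for skewed data such as cumulative case counts.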
from scipy import stats
z= np.abs(stats.zscore(dataset_1['Deaths']))
print(z)
df2=dataset_1['Deaths']
0 0.371877
1 0.371877
2 0.371877
3 0.371877
4 0.371877
...
18105 0.021540
18106 0.301188
18107 0.301911
18108 1.710849
18109 1.297230
Name: Deaths, Length: 18047, dtype: float64
df_outlier2=df2[(z<3)]  # keeps the values with |z| < 3, i.e. the rows that are not outliers
df_outlier2
0 0
1 0
2 0
3 0
4 0
...
18105 3831
18106 773
18107 7368
18108 22775
18109 18252
Name: Deaths, Length: 17733, dtype: int64
import seaborn as sns;sns.set(style='whitegrid')
%matplotlib inline
from numpy.linalg import pinv,inv
import matplotlib.image as mpimg
import gc
from pandas.plotting import scatter_matrix
import plotly.express as px
df_district = pd.read_csv('C:\\Users\\user\\OneDrive\\Desktop\\4th semester\\EDA\\Project\\district_level_latest.csv')
df_district.head()
| | state | state code | district | confirmed | active | deaths | recovered | delta_confirmed | delta_deceased | delta_recovered | notes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Andaman and Nicobar Islands | AN | Nicobars | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN |
| 1 | Andaman and Nicobar Islands | AN | North and Middle Andaman | 1 | 0 | 0 | 1 | 0 | 0 | 0 | NaN |
| 2 | Andaman and Nicobar Islands | AN | South Andaman | 32 | 0 | 0 | 32 | 0 | 0 | 0 | NaN |
| 3 | Andhra Pradesh | AP | Anantapur | 122 | 62 | 4 | 56 | 4 | 0 | 4 | NaN |
| 4 | Andhra Pradesh | AP | Chittoor | 165 | 88 | 0 | 77 | 14 | 4 | 0 | NaN |
df_district.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 753 entries, 0 to 752
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   state            753 non-null    object
 1   state code       753 non-null    object
 2   district         753 non-null    object
 3   confirmed        753 non-null    int64
 4   active           753 non-null    int64
 5   deaths           753 non-null    int64
 6   recovered        753 non-null    int64
 7   delta_confirmed  753 non-null    int64
 8   delta_deceased   753 non-null    int64
 9   delta_recovered  753 non-null    int64
 10  notes            22 non-null     object
dtypes: int64(7), object(4)
memory usage: 64.8+ KB
df_district.describe()
| | confirmed | active | deaths | recovered | delta_confirmed | delta_deceased | delta_recovered |
|---|---|---|---|---|---|---|---|
| count | 753.000000 | 753.000000 | 753.000000 | 753.000000 | 753.000000 | 753.000000 | 753.000000 |
| mean | 109.177955 | 68.407703 | 3.517928 | 37.244356 | 0.217795 | 0.022576 | 0.100930 |
| std | 773.663211 | 573.060737 | 30.265097 | 197.436770 | 2.198973 | 0.350941 | 1.160251 |
| min | 0.000000 | -372.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 50% | 10.000000 | 3.000000 | 0.000000 | 2.000000 | 0.000000 | 0.000000 | 0.000000 |
| 75% | 41.000000 | 19.000000 | 1.000000 | 18.000000 | 0.000000 | 0.000000 | 0.000000 |
| max | 16738.000000 | 13173.000000 | 621.000000 | 3045.000000 | 45.000000 | 8.000000 | 27.000000 |
df_district.duplicated().sum()
0
df_district.dtypes
state              object
state code         object
district           object
confirmed           int64
active              int64
deaths              int64
recovered           int64
delta_confirmed     int64
delta_deceased      int64
delta_recovered     int64
notes              object
dtype: object
df_district.isnull().sum()
state                0
state code           0
district             0
confirmed            0
active               0
deaths               0
recovered            0
delta_confirmed      0
delta_deceased       0
delta_recovered      0
notes              731
dtype: int64
grouped_df_district = df_district[["state","district","confirmed","active","recovered","deaths"]]
grouped_df_district
| | state | district | confirmed | active | recovered | deaths |
|---|---|---|---|---|---|---|
| 0 | Andaman and Nicobar Islands | Nicobars | 0 | 0 | 0 | 0 |
| 1 | Andaman and Nicobar Islands | North and Middle Andaman | 1 | 0 | 1 | 0 |
| 2 | Andaman and Nicobar Islands | South Andaman | 32 | 0 | 32 | 0 |
| 3 | Andhra Pradesh | Anantapur | 122 | 62 | 56 | 4 |
| 4 | Andhra Pradesh | Chittoor | 165 | 88 | 77 | 0 |
| ... | ... | ... | ... | ... | ... | ... |
| 748 | West Bengal | Purba Bardhaman | 10 | 7 | 3 | 0 |
| 749 | West Bengal | Purba Medinipur | 49 | 22 | 26 | 1 |
| 750 | West Bengal | Purulia | 0 | 0 | 0 | 0 |
| 751 | West Bengal | South 24 Parganas | 79 | 50 | 27 | 2 |
| 752 | West Bengal | Uttar Dinajpur | 4 | 4 | 0 | 0 |
753 rows × 6 columns
grouped_df_district1 = df_district[["district","delta_confirmed","delta_deceased","delta_recovered"]]
grouped_df_district1
| | district | delta_confirmed | delta_deceased | delta_recovered |
|---|---|---|---|---|
| 0 | Nicobars | 0 | 0 | 0 |
| 1 | North and Middle Andaman | 0 | 0 | 0 |
| 2 | South Andaman | 0 | 0 | 0 |
| 3 | Anantapur | 4 | 0 | 4 |
| 4 | Chittoor | 14 | 4 | 0 |
| ... | ... | ... | ... | ... |
| 748 | Purba Bardhaman | 0 | 0 | 0 |
| 749 | Purba Medinipur | 0 | 0 | 0 |
| 750 | Purulia | 0 | 0 | 0 |
| 751 | South 24 Parganas | 0 | 0 | 0 |
| 752 | Uttar Dinajpur | 0 | 0 | 0 |
753 rows × 4 columns
grouped_df_district = grouped_df_district.sort_values(by="confirmed",ascending=False)
grouped_df_district = grouped_df_district.reset_index(drop=True)
grouped_df_district
| | state | district | confirmed | active | recovered | deaths |
|---|---|---|---|---|---|---|
| 0 | Maharashtra | Mumbai | 16738 | 13173 | 2944 | 621 |
| 1 | Delhi | Delhi_ | 7682 | 4523 | 3045 | 114 |
| 2 | Gujarat | Ahmedabad | 6910 | 4198 | 2247 | 465 |
| 3 | Tamil Nadu | Chennai | 5637 | 4834 | 758 | 45 |
| 4 | Maharashtra | Pune | 3314 | 1762 | 1377 | 175 |
| ... | ... | ... | ... | ... | ... | ... |
| 748 | Manipur | Pherzawl | 0 | 0 | 0 | 0 |
| 749 | Manipur | Noney | 0 | 0 | 0 | 0 |
| 750 | Manipur | Kangpokpi | 0 | 0 | 0 | 0 |
| 751 | Manipur | Kamjong | 0 | 0 | 0 | 0 |
| 752 | Manipur | Jiribam | 0 | 0 | 0 | 0 |
753 rows × 6 columns
sns.set(rc={'figure.figsize':(15,8)})
sns.barplot(x='district', y='active', data=grouped_df_district.nlargest(10,'active'))
plt.show()
sns.set(rc={'figure.figsize':(15,8)})
sns.barplot(x='district', y='recovered', data=grouped_df_district.nlargest(10,'recovered'))
plt.show()
sns.set(rc={'figure.figsize':(15,8)})
sns.barplot(x='district', y='deaths', data=grouped_df_district.nlargest(10,'deaths'))
plt.show()
sns.set(rc={'figure.figsize':(15,8)})
sns.barplot(x='district', y='delta_confirmed', data=grouped_df_district1.nlargest(10,'delta_confirmed'))
plt.show()
sns.set(rc={'figure.figsize':(15,8)})
sns.barplot(x='district', y='delta_deceased', data=grouped_df_district1.nlargest(10,'delta_deceased'))
plt.show()
sns.set(rc={'figure.figsize':(15,8)})
sns.barplot(x='district', y='delta_recovered', data=grouped_df_district1.nlargest(10,'delta_recovered'))
plt.show()
# keep districts whose numeric columns sum to more than zero
data = df_district[df_district.sum(axis=1, numeric_only=True) > 0]
data = data.groupby(['state'])['deaths'].sum().reset_index()
data_death = data[data['deaths'] > 0]
state_fig = px.bar(data_death, x='state', y='deaths', title='State-wise deaths reported for COVID-19 in India', text='deaths')
state_fig.show()
# keep districts whose numeric columns sum to more than zero
data = df_district[df_district.sum(axis=1, numeric_only=True) > 0]
data = data.groupby(['state'])['recovered'].sum().reset_index()
data_recovered = data[data['recovered'] > 0]
state_fig = px.bar(data_recovered, x='state', y='recovered', title='State-wise recoveries reported for COVID-19 in India', text='recovered')
state_fig.show()
df_district['active'] = df_district['confirmed'] - df_district['deaths'] - df_district['recovered']
# select the columns with a list; indexing a groupby with several bare keys is deprecated
r_data = df_district.groupby(["state"])[["deaths", "confirmed", "recovered", "active"]].sum().reset_index()
r_data = r_data.sort_values(by='deaths', ascending=False)
r_data = r_data[r_data['deaths']>50]
plt.figure(figsize=(15, 5))
plt.plot(r_data['state'], r_data['deaths'],color='red', label='Deaths')
plt.plot(r_data['state'], r_data['confirmed'],color='green', label='Confirmed')
plt.plot(r_data['state'], r_data['recovered'], color='blue', label='Recovered')
plt.plot(r_data['state'], r_data['active'], color='black', label='Active')
plt.title('Deaths (>50), Confirmed, Recovered and Active Cases by State')
plt.legend()
plt.show()
In this graph, the red line shows the number of deaths, green shows confirmed cases, blue shows the number of people recovered, and black shows the active cases at that time.
df_district["deaths"] = pd.cut(df_district["deaths"],bins=[0., 1.5, 3.0, 4.5, 6., np.inf],labels=[1, 2, 3, 4, 5])
dataset = pd.read_csv('C:\\Users\\user\\OneDrive\\Desktop\\4th semester\\EDA\\Project\\district_level_latest.csv')
x = dataset.iloc[3:, :-1].values
y = dataset.iloc[4:, 1].values
x_train = df_district.active
y_train = df_district.recovered
x_train.head()
0     0
1     0
2     0
3    62
4    88
Name: active, dtype: int64
plt.scatter(x_train, y_train, color = "red")
plt.title("active VS recovered")
plt.xlabel("Active Case")
plt.ylabel("Recovered")
plt.show()
gb = df_district.groupby('state')
gb.first()
| state | state code | district | confirmed | active | deaths | recovered | delta_confirmed | delta_deceased | delta_recovered | notes |
|---|---|---|---|---|---|---|---|---|---|---|
| Andaman and Nicobar Islands | AN | Nicobars | 0 | 0 | NaN | 0 | 0 | 0 | 0 | None |
| Andhra Pradesh | AP | Anantapur | 122 | 62 | 3 | 56 | 4 | 0 | 4 | None |
| Arunachal Pradesh | AR | Anjaw | 0 | 0 | NaN | 0 | 0 | 0 | 0 | None |
| Assam | AS | Baksa | 0 | 0 | 1 | 0 | 0 | 0 | 0 | Case tranferred from Nagaland |
| Bihar | BR | Araria | 4 | 3 | 1 | 1 | 0 | 0 | 0 | None |
| Chandigarh | CH | Chandigarh | 191 | 151 | 2 | 37 | 0 | 0 | 0 | None |
| Chhattisgarh | CT | Balod | 1 | 1 | NaN | 0 | 0 | 0 | 0 | None |
| Dadra and Nagar Haveli and Daman and Diu | DN | Dadra and Nagar Haveli | 1 | 0 | NaN | 1 | 0 | 0 | 0 | None |
| Delhi | DL | Central Delhi | 184 | 184 | 1 | 0 | 0 | 0 | 0 | None |
| Goa | GA | North Goa | 6 | 0 | NaN | 6 | 0 | 0 | 0 | None |
| Gujarat | GJ | Other State | 1 | 1 | 5 | 0 | 0 | 0 | 0 | None |
| Haryana | HR | Ambala | 42 | 2 | 2 | 38 | 0 | 0 | 0 | Italian tourists who were treated in Haryana |
| Himachal Pradesh | HP | Bilaspur | 4 | 4 | 1 | 0 | 0 | 0 | 0 | Active cases different due to migrated cases |
| Jammu and Kashmir | JK | Anantnag | 145 | 121 | 1 | 23 | 0 | 0 | 0 | None |
| Jharkhand | JH | Bokaro | 10 | 0 | 1 | 9 | 0 | 0 | 0 | None |
| Karnataka | KA | Bagalkote | 69 | 41 | 1 | 27 | 0 | 0 | 0 | One death on 27th Apr is not included as it's ... |
| Kerala | KL | Alappuzha | 5 | 0 | 1 | 5 | 0 | 0 | 0 | Case of Mahe native who expired in Kannur, add... |
| Ladakh | LA | Kargil | 9 | 2 | NaN | 7 | 0 | 0 | 0 | None |
| Lakshadweep | LD | Lakshadweep | 0 | 0 | NaN | 0 | 0 | 0 | 0 | None |
| Madhya Pradesh | MP | Agar Malwa | 13 | 0 | 1 | 12 | 0 | 0 | 0 | MP bulletin dated 28 Apr reduced total cases i... |
| Maharashtra | MH | Ahmednagar | 70 | 32 | 2 | 35 | 0 | 0 | 0 | Reconciled as per MH bulleting 24/04 |
| Manipur | MN | Bishnupur | 0 | 0 | NaN | 0 | 0 | 0 | 0 | None |
| Meghalaya | ML | East Garo Hills | 0 | 0 | 1 | 0 | 0 | 0 | 0 | None |
| Mizoram | MZ | Aizawl | 1 | 0 | NaN | 1 | 0 | 0 | 0 | None |
| Nagaland | NL | Dimapur | 0 | 0 | NaN | 0 | 0 | 0 | 0 | None |
| Odisha | OR | Angul | 15 | 15 | 1 | 0 | 0 | 0 | 0 | Khorda (except Bhubaneswar municipal corporati... |
| Puducherry | PY | Karaikal | 1 | 1 | NaN | 0 | 0 | 0 | 0 | None |
| Punjab | PB | Amritsar | 298 | 260 | 3 | 34 | 0 | 0 | 0 | None |
| Rajasthan | RJ | Ajmer | 242 | 125 | 4 | 112 | 0 | 0 | 2 | Evacuees from other countries; They have been ... |
| Sikkim | SK | East District | 0 | 0 | NaN | 0 | 0 | 0 | 0 | None |
| Tamil Nadu | TN | Airport Quarantine | 9 | 9 | 2 | 0 | 0 | 0 | 0 | None |
| Telangana | TG | Other State | 37 | 37 | 5 | 0 | 0 | 0 | 0 | None |
| Tripura | TR | Dhalai | 152 | 125 | NaN | 27 | 0 | 0 | 0 | None |
| Uttar Pradesh | UP | Agra | 785 | 367 | 5 | 394 | 0 | 0 | 0 | [14th May] <br>\nConfirmed cases for the distr... |
| Uttarakhand | UT | Almora | 2 | 1 | 1 | 1 | 0 | 0 | 0 | None |
| West Bengal | WB | Alipurduar | 0 | 0 | 1 | 0 | 0 | 0 | 0 | None |
gbb = df_district.groupby(['state', 'active'])
gbb.first()
| state | active | state code | district | confirmed | deaths | recovered | delta_confirmed | delta_deceased | delta_recovered | notes |
|---|---|---|---|---|---|---|---|---|---|---|
| Andaman and Nicobar Islands | 0 | AN | Nicobars | 0 | NaN | 0 | 0 | 0 | 0 | None |
| Andhra Pradesh | 3 | AP | Prakasam | 63 | NaN | 60 | 0 | 0 | 0 | None |
| | 7 | AP | Vizianagaram | 7 | NaN | 0 | 3 | 0 | 0 | None |
| | 17 | AP | East Godavari | 52 | NaN | 35 | 1 | 0 | 0 | None |
| | 24 | AP | West Godavari | 69 | NaN | 45 | 0 | 0 | 5 | None |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| West Bengal | 50 | WB | South 24 Parganas | 79 | 2 | 27 | 0 | 0 | 0 | None |
| | 101 | WB | Hooghly | 135 | 3 | 30 | 0 | 0 | 0 | None |
| | 186 | WB | North 24 Parganas | 317 | 5 | 102 | 0 | 0 | 0 | None |
| | 347 | WB | Howrah | 509 | 5 | 135 | 0 | 0 | 0 | None |
| | 625 | WB | Kolkata | 1157 | 5 | 386 | 0 | 0 | 0 | None |
384 rows × 9 columns
dataset2=pd.read_csv("C:\\Users\\user\\OneDrive\\Desktop\\4th semester\\EDA\\Project\\covid_vaccine_statewise.csv")
dataset2
| Updated On | State | Total Doses Administered | Sessions | Sites | First Dose Administered | Second Dose Administered | Male (Doses Administered) | Female (Doses Administered) | Transgender (Doses Administered) | ... | 18-44 Years (Doses Administered) | 45-60 Years (Doses Administered) | 60+ Years (Doses Administered) | 18-44 Years(Individuals Vaccinated) | 45-60 Years(Individuals Vaccinated) | 60+ Years(Individuals Vaccinated) | Male(Individuals Vaccinated) | Female(Individuals Vaccinated) | Transgender(Individuals Vaccinated) | Total Individuals Vaccinated | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 16/01/2021 | India | 48276.0 | 3455.0 | 2957.0 | 48276.0 | 0.0 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | 23757.0 | 24517.0 | 2.0 | 48276.0 |
| 1 | 17/01/2021 | India | 58604.0 | 8532.0 | 4954.0 | 58604.0 | 0.0 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | 27348.0 | 31252.0 | 4.0 | 58604.0 |
| 2 | 18/01/2021 | India | 99449.0 | 13611.0 | 6583.0 | 99449.0 | 0.0 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | 41361.0 | 58083.0 | 5.0 | 99449.0 |
| 3 | 19/01/2021 | India | 195525.0 | 17855.0 | 7951.0 | 195525.0 | 0.0 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | 81901.0 | 113613.0 | 11.0 | 195525.0 |
| 4 | 20/01/2021 | India | 251280.0 | 25472.0 | 10504.0 | 251280.0 | 0.0 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | 98111.0 | 153145.0 | 24.0 | 251280.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7840 | 11/08/2021 | West Bengal | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 7841 | 12/08/2021 | West Bengal | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 7842 | 13/08/2021 | West Bengal | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 7843 | 14/08/2021 | West Bengal | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 7844 | 15/08/2021 | West Bengal | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
7845 rows × 24 columns
dataset2.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 7845 entries, 0 to 7844 Data columns (total 24 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Updated On 7845 non-null object 1 State 7845 non-null object 2 Total Doses Administered 7621 non-null float64 3 Sessions 7621 non-null float64 4 Sites 7621 non-null float64 5 First Dose Administered 7621 non-null float64 6 Second Dose Administered 7621 non-null float64 7 Male (Doses Administered) 7461 non-null float64 8 Female (Doses Administered) 7461 non-null float64 9 Transgender (Doses Administered) 7461 non-null float64 10 Covaxin (Doses Administered) 7621 non-null float64 11 CoviShield (Doses Administered) 7621 non-null float64 12 Sputnik V (Doses Administered) 2995 non-null float64 13 AEFI 5438 non-null float64 14 18-44 Years (Doses Administered) 1702 non-null float64 15 45-60 Years (Doses Administered) 1702 non-null float64 16 60+ Years (Doses Administered) 1702 non-null float64 17 18-44 Years(Individuals Vaccinated) 3733 non-null float64 18 45-60 Years(Individuals Vaccinated) 3734 non-null float64 19 60+ Years(Individuals Vaccinated) 3734 non-null float64 20 Male(Individuals Vaccinated) 160 non-null float64 21 Female(Individuals Vaccinated) 160 non-null float64 22 Transgender(Individuals Vaccinated) 160 non-null float64 23 Total Individuals Vaccinated 5919 non-null float64 dtypes: float64(22), object(2) memory usage: 1.4+ MB
dataset2.isnull()
| Updated On | State | Total Doses Administered | Sessions | Sites | First Dose Administered | Second Dose Administered | Male (Doses Administered) | Female (Doses Administered) | Transgender (Doses Administered) | ... | 18-44 Years (Doses Administered) | 45-60 Years (Doses Administered) | 60+ Years (Doses Administered) | 18-44 Years(Individuals Vaccinated) | 45-60 Years(Individuals Vaccinated) | 60+ Years(Individuals Vaccinated) | Male(Individuals Vaccinated) | Female(Individuals Vaccinated) | Transgender(Individuals Vaccinated) | Total Individuals Vaccinated | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | True | True | True | ... | True | True | True | True | True | True | False | False | False | False |
| 1 | False | False | False | False | False | False | False | True | True | True | ... | True | True | True | True | True | True | False | False | False | False |
| 2 | False | False | False | False | False | False | False | True | True | True | ... | True | True | True | True | True | True | False | False | False | False |
| 3 | False | False | False | False | False | False | False | True | True | True | ... | True | True | True | True | True | True | False | False | False | False |
| 4 | False | False | False | False | False | False | False | True | True | True | ... | True | True | True | True | True | True | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7840 | False | False | True | True | True | True | True | True | True | True | ... | True | True | True | True | True | True | True | True | True | True |
| 7841 | False | False | True | True | True | True | True | True | True | True | ... | True | True | True | True | True | True | True | True | True | True |
| 7842 | False | False | True | True | True | True | True | True | True | True | ... | True | True | True | True | True | True | True | True | True | True |
| 7843 | False | False | True | True | True | True | True | True | True | True | ... | True | True | True | True | True | True | True | True | True | True |
| 7844 | False | False | True | True | True | True | True | True | True | True | ... | True | True | True | True | True | True | True | True | True | True |
7845 rows × 24 columns
dataset2.isnull().sum()
Updated On 0 State 0 Total Doses Administered 224 Sessions 224 Sites 224 First Dose Administered 224 Second Dose Administered 224 Male (Doses Administered) 384 Female (Doses Administered) 384 Transgender (Doses Administered) 384 Covaxin (Doses Administered) 224 CoviShield (Doses Administered) 224 Sputnik V (Doses Administered) 4850 AEFI 2407 18-44 Years (Doses Administered) 6143 45-60 Years (Doses Administered) 6143 60+ Years (Doses Administered) 6143 18-44 Years(Individuals Vaccinated) 4112 45-60 Years(Individuals Vaccinated) 4111 60+ Years(Individuals Vaccinated) 4111 Male(Individuals Vaccinated) 7685 Female(Individuals Vaccinated) 7685 Transgender(Individuals Vaccinated) 7685 Total Individuals Vaccinated 1926 dtype: int64
# Remove the national "India" rows, rows with no vaccinated total, and columns that are mostly empty
dataset2=dataset2[dataset2.State!='India']
dataset2=dataset2[dataset2['Total Individuals Vaccinated'].notna()]
dataset2=dataset2.drop(labels=["Transgender(Individuals Vaccinated)", "Female(Individuals Vaccinated)", "Male(Individuals Vaccinated)", "60+ Years (Doses Administered)", "45-60 Years (Doses Administered)","18-44 Years (Doses Administered)"],axis=1)
male_vaccinated = dataset2["Male (Doses Administered)"].sum()
female_vaccinated = dataset2["Female (Doses Administered)"].sum()
male_vaccinated
7135565446.0
female_vaccinated
6318823830.0
fig = px.pie(values=[male_vaccinated ,female_vaccinated], names=["Male Vaccinated","Female Vaccinated"], width=800,height=500)
fig.update_layout(
title="Gender wise vaccination status",
legend_title="Gender",
font=dict(size=14)
)
fig.layout.template = 'presentation'
fig.show()
The pie chart shows that about 53% of the summed gender-wise doses went to males and about 47% to females.
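The dose columns in this dataset appear to be running totals that grow day by day, so adding up every daily row counts early doses many times over. The ratio between male and female is still informative, but the absolute totals are not head counts. A minimal sketch, assuming the columns are cumulative and using a small invented frame in place of `dataset2`, of taking the most recent value per state instead:

```python
import pandas as pd

# Toy data standing in for dataset2; the numbers are invented for illustration.
toy = pd.DataFrame({
    "Updated On": ["16/01/2021", "17/01/2021", "18/01/2021"] * 2,
    "State": ["A"] * 3 + ["B"] * 3,
    "Male (Doses Administered)": [10.0, 25.0, 40.0, 5.0, 5.0, 12.0],
    "Female (Doses Administered)": [8.0, 20.0, 35.0, 6.0, 9.0, 15.0],
})

# Parse dd/mm/yyyy strings so "latest" means latest in time, not in text order.
toy["Updated On"] = pd.to_datetime(toy["Updated On"], format="%d/%m/%Y")

gender_cols = ["Male (Doses Administered)", "Female (Doses Administered)"]

# Keep the last (most recent) cumulative value for each state.
latest = toy.sort_values("Updated On").groupby("State")[gender_cols].last()

naive_total = toy[gender_cols].sum()   # adds every daily running total
snapshot_total = latest.sum()          # adds one final value per state

male_share = snapshot_total.iloc[0] / snapshot_total.sum()
```

If the assumption holds for the real data, `snapshot_total` is the figure to quote as a count, while `naive_total` is only usable for relative comparisons.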
statewise_vaccination=dataset2.groupby("State")[["First Dose Administered","Second Dose Administered"]].sum().reset_index()
statewise_vaccination
| State | First Dose Administered | Second Dose Administered | |
|---|---|---|---|
| 0 | Andaman and Nicobar Islands | 8.083888e+06 | 1141995.0 |
| 1 | Andhra Pradesh | 5.629879e+08 | 160345737.0 |
| 2 | Arunachal Pradesh | 2.099771e+07 | 5752060.0 |
| 3 | Assam | 2.392148e+08 | 57541214.0 |
| 4 | Bihar | 6.589511e+08 | 126284969.0 |
| 5 | Chandigarh | 1.969515e+07 | 4951484.0 |
| 6 | Chhattisgarh | 4.340759e+08 | 80327528.0 |
| 7 | Dadra and Nagar Haveli and Daman and Diu | 1.133797e+07 | 1776446.0 |
| 8 | Delhi | 3.049722e+08 | 84115315.0 |
| 9 | Goa | 3.204142e+07 | 6800934.0 |
| 10 | Gujarat | 1.074926e+09 | 280843871.0 |
| 11 | Haryana | 3.630617e+08 | 65489121.0 |
| 12 | Himachal Pradesh | 1.500760e+08 | 29079195.0 |
| 13 | Jammu and Kashmir | 2.034292e+08 | 39287506.0 |
| 14 | Jharkhand | 2.882814e+08 | 54330129.0 |
| 15 | Karnataka | 8.663366e+08 | 182179479.0 |
| 16 | Kerala | 6.189776e+08 | 144617802.0 |
| 17 | Ladakh | 9.447258e+06 | 2611222.0 |
| 18 | Lakshadweep | 2.120319e+06 | 482625.0 |
| 19 | Madhya Pradesh | 7.697363e+08 | 130607873.0 |
| 20 | Maharashtra | 1.400431e+09 | 301105538.0 |
| 21 | Manipur | 2.659080e+07 | 5747319.0 |
| 22 | Meghalaya | 2.713678e+07 | 5842387.0 |
| 23 | Mizoram | 2.050252e+07 | 4038784.0 |
| 24 | Nagaland | 1.756547e+07 | 4124561.0 |
| 25 | Odisha | 5.087671e+08 | 107810476.0 |
| 26 | Puducherry | 1.773671e+07 | 3270153.0 |
| 27 | Punjab | 2.871185e+08 | 49267328.0 |
| 28 | Rajasthan | 1.138229e+09 | 227002050.0 |
| 29 | Sikkim | 1.608638e+07 | 4112968.0 |
| 30 | Tamil Nadu | 5.429936e+08 | 131822416.0 |
| 31 | Telangana | 3.919721e+08 | 81567248.0 |
| 32 | Tripura | 9.348524e+07 | 33297280.0 |
| 33 | Uttar Pradesh | 1.196438e+09 | 259005880.0 |
| 34 | Uttarakhand | 1.741822e+08 | 46557931.0 |
| 35 | West Bengal | 9.226559e+08 | 256717715.0 |
vaccination=dataset2.pivot_table( index = 'State', values = ['First Dose Administered','Second Dose Administered'], aggfunc = 'sum' ).reset_index()
vaccination.style.background_gradient(cmap='twilight')
| State | First Dose Administered | Second Dose Administered | |
|---|---|---|---|
| 0 | Andaman and Nicobar Islands | 8083888.000000 | 1141995.000000 |
| 1 | Andhra Pradesh | 562987902.000000 | 160345737.000000 |
| 2 | Arunachal Pradesh | 20997713.000000 | 5752060.000000 |
| 3 | Assam | 239214775.000000 | 57541214.000000 |
| 4 | Bihar | 658951108.000000 | 126284969.000000 |
| 5 | Chandigarh | 19695148.000000 | 4951484.000000 |
| 6 | Chhattisgarh | 434075946.000000 | 80327528.000000 |
| 7 | Dadra and Nagar Haveli and Daman and Diu | 11337973.000000 | 1776446.000000 |
| 8 | Delhi | 304972186.000000 | 84115315.000000 |
| 9 | Goa | 32041420.000000 | 6800934.000000 |
| 10 | Gujarat | 1074926034.000000 | 280843871.000000 |
| 11 | Haryana | 363061708.000000 | 65489121.000000 |
| 12 | Himachal Pradesh | 150075973.000000 | 29079195.000000 |
| 13 | Jammu and Kashmir | 203429155.000000 | 39287506.000000 |
| 14 | Jharkhand | 288281448.000000 | 54330129.000000 |
| 15 | Karnataka | 866336587.000000 | 182179479.000000 |
| 16 | Kerala | 618977564.000000 | 144617802.000000 |
| 17 | Ladakh | 9447258.000000 | 2611222.000000 |
| 18 | Lakshadweep | 2120319.000000 | 482625.000000 |
| 19 | Madhya Pradesh | 769736314.000000 | 130607873.000000 |
| 20 | Maharashtra | 1400430993.000000 | 301105538.000000 |
| 21 | Manipur | 26590795.000000 | 5747319.000000 |
| 22 | Meghalaya | 27136779.000000 | 5842387.000000 |
| 23 | Mizoram | 20502516.000000 | 4038784.000000 |
| 24 | Nagaland | 17565474.000000 | 4124561.000000 |
| 25 | Odisha | 508767148.000000 | 107810476.000000 |
| 26 | Puducherry | 17736714.000000 | 3270153.000000 |
| 27 | Punjab | 287118510.000000 | 49267328.000000 |
| 28 | Rajasthan | 1138229441.000000 | 227002050.000000 |
| 29 | Sikkim | 16086375.000000 | 4112968.000000 |
| 30 | Tamil Nadu | 542993553.000000 | 131822416.000000 |
| 31 | Telangana | 391972116.000000 | 81567248.000000 |
| 32 | Tripura | 93485242.000000 | 33297280.000000 |
| 33 | Uttar Pradesh | 1196437796.000000 | 259005880.000000 |
| 34 | Uttarakhand | 174182247.000000 | 46557931.000000 |
| 35 | West Bengal | 922655934.000000 | 256717715.000000 |
vaccination_gender=dataset2.pivot_table( index = 'State', values = ['Male (Doses Administered)','Female (Doses Administered)'], aggfunc = 'sum' ).reset_index()
vaccination_gender.style.background_gradient(cmap='RdBu_r')
| State | Female (Doses Administered) | Male (Doses Administered) | |
|---|---|---|---|
| 0 | Andaman and Nicobar Islands | 3713987.000000 | 4387523.000000 |
| 1 | Andhra Pradesh | 282110452.000000 | 282404176.000000 |
| 2 | Arunachal Pradesh | 9320135.000000 | 11753535.000000 |
| 3 | Assam | 109322106.000000 | 130411866.000000 |
| 4 | Bihar | 311444792.000000 | 349289470.000000 |
| 5 | Chandigarh | 8381208.000000 | 11348264.000000 |
| 6 | Chhattisgarh | 223617997.000000 | 211645756.000000 |
| 7 | Dadra and Nagar Haveli and Daman and Diu | 4309194.000000 | 7048017.000000 |
| 8 | Delhi | 125854689.000000 | 179823973.000000 |
| 9 | Goa | 15687251.000000 | 16425164.000000 |
| 10 | Gujarat | 500611622.000000 | 577514466.000000 |
| 11 | Haryana | 168890962.000000 | 194807605.000000 |
| 12 | Himachal Pradesh | 76280181.000000 | 74186948.000000 |
| 13 | Jammu and Kashmir | 81029066.000000 | 122692650.000000 |
| 14 | Jharkhand | 136860180.000000 | 152254829.000000 |
| 15 | Karnataka | 440647051.000000 | 427750736.000000 |
| 16 | Kerala | 335233013.000000 | 285509478.000000 |
| 17 | Ladakh | 4308849.000000 | 5156509.000000 |
| 18 | Lakshadweep | 909442.000000 | 1215081.000000 |
| 19 | Madhya Pradesh | 351430006.000000 | 420324751.000000 |
| 20 | Maharashtra | 648834014.000000 | 754051696.000000 |
| 21 | Manipur | 11250938.000000 | 15398920.000000 |
| 22 | Meghalaya | 13043905.000000 | 14158601.000000 |
| 23 | Mizoram | 9976825.000000 | 10594236.000000 |
| 24 | Nagaland | 7032326.000000 | 10590660.000000 |
| 25 | Odisha | 242718934.000000 | 267723377.000000 |
| 26 | Puducherry | 8645162.000000 | 9113173.000000 |
| 27 | Punjab | 123351778.000000 | 164160913.000000 |
| 28 | Rajasthan | 547146906.000000 | 593849036.000000 |
| 29 | Sikkim | 7440996.000000 | 8693029.000000 |
| 30 | Tamil Nadu | 256064789.000000 | 287615366.000000 |
| 31 | Telangana | 189034200.000000 | 204262462.000000 |
| 32 | Tripura | 44935091.000000 | 48845211.000000 |
| 33 | Uttar Pradesh | 518774798.000000 | 681560268.000000 |
| 34 | Uttarakhand | 84788817.000000 | 89916330.000000 |
| 35 | West Bengal | 415822168.000000 | 509081371.000000 |
vaccination=dataset2.pivot_table( index = 'State', values = ['Total Individuals Vaccinated'], aggfunc = 'sum' ).sort_values(by = ['Total Individuals Vaccinated'],ascending=False).reset_index()
fig = px.bar( vaccination, x='Total Individuals Vaccinated',y='State', color ='State',width=900, height=550)
fig.update_layout(
title="States with number of vaccinated individuals",
xaxis_title="Total individuals vaccinated",
yaxis_title="State",
legend_title="State",
font=dict(
size=14
)
)
fig.layout.template = 'presentation'
fig.show()
This bar chart ranks the states from highest to lowest by total individuals vaccinated. Maharashtra has the highest figure, followed by Uttar Pradesh, Rajasthan, Gujarat and the remaining states.
# Top 15 states by total individuals vaccinated
vaccination1=dataset2.pivot_table( index = 'State', values = ['Total Individuals Vaccinated'], aggfunc = 'sum' ).sort_values(by = ['Total Individuals Vaccinated'],ascending=False).reset_index().head(15)
fig = px.bar( vaccination1, x='State',y='Total Individuals Vaccinated',width=900, height=550)
fig.update_layout(
title="Top 15 states by number of vaccinated individuals",
xaxis_title="State",
yaxis_title="Individuals vaccinated",
legend_title="State",
font=dict(
size=14
)
)
fig.layout.template = 'presentation'
fig.show()
# Gender-wise doses administered in each state (all states, unsorted)
vaccination2=dataset2.pivot_table( index = 'State', values = ['Male (Doses Administered)'], aggfunc = 'sum' ).reset_index()
fig = px.bar( vaccination2, x='State',y='Male (Doses Administered)',width=900, height=550)
fig.update_layout(
title="States with number of male vaccinated",
xaxis_title="State",
yaxis_title="Doses",
legend_title="State",
font=dict(
size=14
)
)
fig.layout.template = 'presentation'
fig.show()
It shows the number of doses administered to males in different states.
vaccination3=dataset2.pivot_table( index = 'State', values = ['Female (Doses Administered)'], aggfunc = 'sum' ).reset_index()
fig = px.bar( vaccination3, x='State',y='Female (Doses Administered)',width=900, height=550)
fig.update_layout(
title="States with number of female vaccinated",
xaxis_title="State",
yaxis_title="Doses",
legend_title="State",
font=dict(
size=14
)
)
fig.layout.template = 'presentation'
fig.show()
It shows the number of doses administered to females in different states.
vaccination4=dataset2.pivot_table( index = 'State', values = ['Male (Doses Administered)'], aggfunc = 'sum' ).sort_values(by = ['Male (Doses Administered)'],ascending=False).reset_index().head(15)
fig = px.bar( vaccination4, x='State',y='Male (Doses Administered)',width=900, height=550)
fig.update_layout(
title="Top 15 States with number of Male vaccinated",
xaxis_title="State",
yaxis_title="Doses",
legend_title="State",
font=dict(
size=14
)
)
fig.layout.template = 'presentation'
fig.show()
vaccination5=dataset2.pivot_table( index = 'State', values = ['Female (Doses Administered)'], aggfunc = 'sum' ).sort_values(by = ['Female (Doses Administered)'],ascending=False).reset_index().head(15)
fig = px.bar( vaccination5, x='State',y='Female (Doses Administered)',width=900, height=550)
fig.update_layout(
title="Top 15 States with number of female vaccinated",
xaxis_title="State",
yaxis_title="Doses",
legend_title="State",
font=dict(
size=14
)
)
fig.layout.template = 'presentation'
fig.show()
vaccine6 = dataset2[" Covaxin (Doses Administered)"].sum()
vaccine6
1716565459.0
vaccine7 = dataset2["CoviShield (Doses Administered)"].sum()
vaccine7
14640233875.0
fig = px.pie(values=[vaccine6,vaccine7], names=["Covaxin","Covishield"],width=800,height=500)
fig.update_layout(
title="Vaccine (Doses Administered)",
legend_title="Vaccine Name",
font=dict(
size=14
)
)
fig.show()
The pie chart shows that Covishield accounts for the large majority of doses administered, about 89.5%, while Covaxin accounts for about 10.5%.
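The two percentages follow directly from the column sums printed above. As a quick check, using only those two printed totals (Sputnik V is left out, as it is in the pie chart):

```python
# Totals printed above for the two vaccine columns.
covaxin_total = 1716565459.0
covishield_total = 14640233875.0

combined = covaxin_total + covishield_total

# Share of each vaccine as a percentage of the combined total, to one decimal place.
covaxin_pct = round(100 * covaxin_total / combined, 1)
covishield_pct = round(100 * covishield_total / combined, 1)

print(covaxin_pct, covishield_pct)
```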
agewise=dataset2.pivot_table( index = 'State', values = ['18-44 Years(Individuals Vaccinated)','45-60 Years(Individuals Vaccinated)','60+ Years(Individuals Vaccinated)'], aggfunc = 'sum' ).sort_values(by = ['18-44 Years(Individuals Vaccinated)','45-60 Years(Individuals Vaccinated)','60+ Years(Individuals Vaccinated)'],ascending=False).reset_index()
agewise
| State | 18-44 Years(Individuals Vaccinated) | 45-60 Years(Individuals Vaccinated) | 60+ Years(Individuals Vaccinated) | |
|---|---|---|---|---|
| 0 | Uttar Pradesh | 244892552.0 | 488909436.0 | 414090525.0 |
| 1 | Maharashtra | 241658734.0 | 584319250.0 | 530095028.0 |
| 2 | Gujarat | 231453106.0 | 426820821.0 | 376690790.0 |
| 3 | Rajasthan | 181950995.0 | 429163746.0 | 484094715.0 |
| 4 | Madhya Pradesh | 177578823.0 | 296933460.0 | 267982290.0 |
| 5 | West Bengal | 163820159.0 | 373380603.0 | 348339928.0 |
| 6 | Karnataka | 162778610.0 | 353393295.0 | 325517613.0 |
| 7 | Tamil Nadu | 147393019.0 | 217706722.0 | 161943358.0 |
| 8 | Bihar | 145118819.0 | 225690190.0 | 264843188.0 |
| 9 | Andhra Pradesh | 101023557.0 | 254203320.0 | 187877645.0 |
| 10 | Delhi | 90950668.0 | 119317570.0 | 82315060.0 |
| 11 | Haryana | 85237095.0 | 128190677.0 | 139849643.0 |
| 12 | Kerala | 82660559.0 | 220170317.0 | 294584407.0 |
| 13 | Telangana | 77762244.0 | 179203007.0 | 123689812.0 |
| 14 | Odisha | 74107195.0 | 204669841.0 | 210911486.0 |
| 15 | Jharkhand | 63320139.0 | 109207209.0 | 105121398.0 |
| 16 | Assam | 61397631.0 | 107917402.0 | 61222783.0 |
| 17 | Punjab | 59333158.0 | 121378482.0 | 100232366.0 |
| 18 | Chhattisgarh | 41727384.0 | 229332983.0 | 148175206.0 |
| 19 | Jammu and Kashmir | 34068964.0 | 95141956.0 | 66307979.0 |
| 20 | Uttarakhand | 33983867.0 | 68695577.0 | 65624574.0 |
| 21 | Himachal Pradesh | 17412755.0 | 68793840.0 | 59984689.0 |
| 22 | Tripura | 13500603.0 | 48965819.0 | 27726999.0 |
| 23 | Manipur | 9858542.0 | 10185405.0 | 4939201.0 |
| 24 | Meghalaya | 9196351.0 | 11413151.0 | 5268632.0 |
| 25 | Goa | 7514873.0 | 12191774.0 | 11403306.0 |
| 26 | Arunachal Pradesh | 7460518.0 | 8984788.0 | 3566261.0 |
| 27 | Nagaland | 6399551.0 | 6638742.0 | 3655521.0 |
| 28 | Puducherry | 5471417.0 | 6712395.0 | 5125846.0 |
| 29 | Chandigarh | 5287721.0 | 7913481.0 | 5823412.0 |
| 30 | Dadra and Nagar Haveli and Daman and Diu | 5054593.0 | 4062240.0 | 1719728.0 |
| 31 | Mizoram | 4826364.0 | 8863596.0 | 5915492.0 |
| 32 | Sikkim | 3557099.0 | 7119614.0 | 4728829.0 |
| 33 | Ladakh | 3240510.0 | 3040285.0 | 2759736.0 |
| 34 | Andaman and Nicobar Islands | 1223324.0 | 4376537.0 | 2243271.0 |
| 35 | Lakshadweep | 591777.0 | 925093.0 | 528243.0 |
fig=px.scatter(agewise,x='State',y=['18-44 Years(Individuals Vaccinated)','45-60 Years(Individuals Vaccinated)','60+ Years(Individuals Vaccinated)'])
fig.update_layout(
title="Individuals vaccinated by age group",
xaxis_title="States",
yaxis_title="Individuals vaccinated",
font=dict(
size=14
)
)
fig.show()
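From the scatter plot and the table above, the 45-60 age group has the largest summed count in most states, generally followed by the 60+ group and then the 18-44 group. In a few states, such as Rajasthan, Bihar and Kerala, the 60+ group is the largest.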
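Reading the leading age group off a scatter plot is error-prone, so it can help to compute it. A small sketch using four rows copied from the `agewise` table above (the variable names `sample` and `largest_group` are introduced here for illustration):

```python
import pandas as pd

age_cols = [
    "18-44 Years(Individuals Vaccinated)",
    "45-60 Years(Individuals Vaccinated)",
    "60+ Years(Individuals Vaccinated)",
]

# A few rows copied from the agewise table above.
sample = pd.DataFrame(
    [
        ["Uttar Pradesh", 244892552.0, 488909436.0, 414090525.0],
        ["Rajasthan", 181950995.0, 429163746.0, 484094715.0],
        ["Delhi", 90950668.0, 119317570.0, 82315060.0],
        ["Ladakh", 3240510.0, 3040285.0, 2759736.0],
    ],
    columns=["State"] + age_cols,
)

# For each state, name the age column holding the largest value.
largest_group = sample.set_index("State")[age_cols].idxmax(axis=1)

# How many of the sampled states each age group leads.
counts = largest_group.value_counts()
```

Running the same two lines on the full `agewise` frame would give the leading group for every state.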
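totalvaccine=dataset2.pivot_table( index = 'Updated On', values = 'Total Individuals Vaccinated', aggfunc = 'sum' )
# 'Updated On' holds dd/mm/yyyy strings; parse and sort them so the x-axis runs chronologically
totalvaccine.index = pd.to_datetime(totalvaccine.index, format='%d/%m/%Y')
totalvaccine = totalvaccine.sort_index()
fig=px.area(totalvaccine,x=totalvaccine.index,y='Total Individuals Vaccinated')
fig.update_layout(
title="Total no. of individuals vaccinated",
xaxis_title="Time Period",
yaxis_title="Individuals vaccinated",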
font=dict(
size=14
)
)
fig.layout.template = 'presentation'
fig.show()
dataset2.plot(kind = 'scatter',x= 'Female (Doses Administered)', y='Male (Doses Administered)', alpha= 0.45,
s=dataset2['Total Doses Administered']/1000000,c= 'Total Doses Administered', cmap = 'jet',
label='Total doses (marker size)',title ='Male vs. female doses administered',figsize= (15,10));
# Drop the non-numeric columns before modelling
dataset2=dataset2.drop(columns=['Updated On','State'])
percent_missing = dataset2.isnull().sum() * 100 / len(dataset2)
missing_value_df1 = pd.DataFrame({'column_name': dataset2.columns,
'percent_missing': percent_missing})
missing_value_df1.sort_values('percent_missing', inplace=True)
missing_value_df1
| column_name | percent_missing | |
|---|---|---|
| Total Doses Administered | Total Doses Administered | 0.000000 |
| Sessions | Sessions | 0.000000 |
| Sites | Sites | 0.000000 |
| First Dose Administered | First Dose Administered | 0.000000 |
| Second Dose Administered | Second Dose Administered | 0.000000 |
| Male (Doses Administered) | Male (Doses Administered) | 0.000000 |
| Female (Doses Administered) | Female (Doses Administered) | 0.000000 |
| Transgender (Doses Administered) | Transgender (Doses Administered) | 0.000000 |
| Covaxin (Doses Administered) | Covaxin (Doses Administered) | 0.000000 |
| CoviShield (Doses Administered) | CoviShield (Doses Administered) | 0.000000 |
| Total Individuals Vaccinated | Total Individuals Vaccinated | 0.000000 |
| AEFI | AEFI | 36.881403 |
| 45-60 Years(Individuals Vaccinated) | 45-60 Years(Individuals Vaccinated) | 36.916131 |
| 60+ Years(Individuals Vaccinated) | 60+ Years(Individuals Vaccinated) | 36.916131 |
| 18-44 Years(Individuals Vaccinated) | 18-44 Years(Individuals Vaccinated) | 36.933495 |
| Sputnik V (Doses Administered) | Sputnik V (Doses Administered) | 78.155930 |
# Drop columns with a large share of missing values before regression
dataset2 = dataset2.drop(labels=['AEFI','45-60 Years(Individuals Vaccinated)','60+ Years(Individuals Vaccinated)','18-44 Years(Individuals Vaccinated)','Sputnik V (Doses Administered)'],axis=1)
dataset2.head()
| Total Doses Administered | Sessions | Sites | First Dose Administered | Second Dose Administered | Male (Doses Administered) | Female (Doses Administered) | Transgender (Doses Administered) | Covaxin (Doses Administered) | CoviShield (Doses Administered) | Total Individuals Vaccinated | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 212 | 23.0 | 2.0 | 2.0 | 23.0 | 0.0 | 12.0 | 11.0 | 0.0 | 0.0 | 23.0 | 23.0 |
| 213 | 23.0 | 2.0 | 2.0 | 23.0 | 0.0 | 12.0 | 11.0 | 0.0 | 0.0 | 23.0 | 23.0 |
| 214 | 42.0 | 9.0 | 2.0 | 42.0 | 0.0 | 29.0 | 13.0 | 0.0 | 0.0 | 42.0 | 42.0 |
| 215 | 89.0 | 12.0 | 2.0 | 89.0 | 0.0 | 53.0 | 36.0 | 0.0 | 0.0 | 89.0 | 89.0 |
| 216 | 124.0 | 16.0 | 3.0 | 124.0 | 0.0 | 67.0 | 57.0 | 0.0 | 0.0 | 124.0 | 124.0 |
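The column drop above hard-codes names read off the missing-value table. A minimal sketch of selecting them programmatically instead, on an invented frame with placeholder column names; the 30% cut-off is an assumption chosen for illustration, not a value stated in this notebook:

```python
import numpy as np
import pandas as pd

# Toy frame with invented values; column names are placeholders.
toy = pd.DataFrame({
    "complete": [1.0, 2.0, 3.0, 4.0],
    "mostly_missing": [np.nan, np.nan, np.nan, 1.0],
    "half_missing": [1.0, np.nan, 3.0, np.nan],
})

# Percentage of missing values per column.
percent_missing = toy.isnull().mean() * 100

threshold = 30  # percent; an assumed cut-off
to_drop = percent_missing[percent_missing > threshold].index.tolist()
cleaned = toy.drop(columns=to_drop)
```

With a threshold between 0 and 36 percent, the same selection on `dataset2` would pick the five columns dropped above, since every other column has no missing values.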
x_train=dataset2.drop('Total Doses Administered',axis=1)
y_train=dataset2['Total Doses Administered']
x_train.head()
| Sessions | Sites | First Dose Administered | Second Dose Administered | Male (Doses Administered) | Female (Doses Administered) | Transgender (Doses Administered) | Covaxin (Doses Administered) | CoviShield (Doses Administered) | Total Individuals Vaccinated | |
|---|---|---|---|---|---|---|---|---|---|---|
| 212 | 2.0 | 2.0 | 23.0 | 0.0 | 12.0 | 11.0 | 0.0 | 0.0 | 23.0 | 23.0 |
| 213 | 2.0 | 2.0 | 23.0 | 0.0 | 12.0 | 11.0 | 0.0 | 0.0 | 23.0 | 23.0 |
| 214 | 9.0 | 2.0 | 42.0 | 0.0 | 29.0 | 13.0 | 0.0 | 0.0 | 42.0 | 42.0 |
| 215 | 12.0 | 2.0 | 89.0 | 0.0 | 53.0 | 36.0 | 0.0 | 0.0 | 89.0 | 89.0 |
| 216 | 16.0 | 3.0 | 124.0 | 0.0 | 67.0 | 57.0 | 0.0 | 0.0 | 124.0 | 124.0 |
y_train.head()
212 23.0 213 23.0 214 42.0 215 89.0 216 124.0 Name: Total Doses Administered, dtype: float64
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(x_train,y_train)
LinearRegression()
model.score(x_train,y_train)
1.0
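A score of exactly 1.0 deserves a second look. In the rows shown above, "Total Doses Administered" equals "First Dose Administered" plus "Second Dose Administered" (for example 23 = 23 + 0), so a linear model can reproduce the target from those two features alone, and scoring on the training rows cannot reveal over-fitting. The sketch below uses invented data to illustrate both points: it evaluates on a held-out split with `train_test_split`, once with the leaking features and once without. The `held_out_r2` helper and the synthetic columns are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for the dose columns; values are invented.
first_dose = rng.integers(0, 1000, size=200).astype(float)
second_dose = rng.integers(0, 500, size=200).astype(float)
sessions = rng.integers(1, 50, size=200).astype(float)

# The target is an exact sum of two of the features.
total_doses = first_dose + second_dose

X_leaky = np.column_stack([sessions, first_dose, second_dose])
X_no_leak = sessions.reshape(-1, 1)


def held_out_r2(X, y):
    """Fit on 75% of the rows and return R^2 on the remaining 25%."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=0
    )
    return LinearRegression().fit(X_tr, y_tr).score(X_te, y_te)


r2_leaky = held_out_r2(X_leaky, total_doses)      # essentially 1.0
r2_no_leak = held_out_r2(X_no_leak, total_doses)  # much lower
```

In this toy setup the near-perfect score comes entirely from the arithmetic identity. Whether the same holds for every row of the real data is worth checking before reading the score as predictive skill.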
lw = 2
plt.figure(figsize=(10, 10))
plt.title("Actual vs. predicted total doses (linear regression)")
# Plot both series against row position so the two lines share an x-axis
plt.plot(dataset2['Total Doses Administered'].to_numpy(), color="gold", linewidth=lw, label="data")
plt.plot(model.predict(x_train), color="blue", linestyle="--", label="linear estimate")
plt.xlabel("Row position")
plt.ylabel("Total doses administered")
plt.legend(loc="upper left")
<matplotlib.legend.Legend at 0x18ae571b790>
A random forest regressor is a meta-estimator that fits a number of decision tree regressors on various sub-samples of the dataset and averages their predictions to improve predictive accuracy and control over-fitting.
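The averaging step can be checked by hand. In this sketch, on invented data, the fitted trees are read from the forest's `estimators_` attribute and their predictions are averaged manually, which should match `forest.predict`:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Invented features and target for illustration.
X = rng.uniform(0, 10, size=(100, 2))
y = 3 * X[:, 0] + X[:, 1] + rng.normal(0, 0.1, size=100)

forest = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

# Each fitted tree is available in estimators_; average their predictions by hand.
per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

matches = np.allclose(manual_average, forest.predict(X))
```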
from sklearn.ensemble import RandomForestRegressor
model1 = RandomForestRegressor()
model1.fit(x_train,y_train)
RandomForestRegressor()
model1.score(x_train,y_train)
0.999985726126071
lw = 2
plt.figure(figsize=(10, 10))
plt.title("Actual vs. predicted total doses (random forest)")
# Plot both series against row position so the two lines share an x-axis
plt.plot(dataset2['Total Doses Administered'].to_numpy(), color="red", linewidth=lw, label="data")
plt.plot(model1.predict(x_train), color="blue", linestyle="--", label="random forest estimate")
plt.xlabel("Row position")
plt.ylabel("Total doses administered")
plt.legend(loc="upper left")
<matplotlib.legend.Legend at 0x18ae52fae00>
Bayesian regression offers a natural way to cope with insufficient or poorly distributed data by formulating linear regression with probability distributions rather than point estimates. The output or response ‘y’ is assumed to be drawn from a probability distribution rather than estimated as a single value.
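One practical consequence is that the model can report an uncertainty alongside each prediction. A minimal sketch on invented data, using the `return_std=True` option of `BayesianRidge.predict` to get the standard deviation of the predictive distribution:

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)

# Invented single-feature data with noise.
X = rng.uniform(0, 10, size=(50, 1))
y = 2.0 * X[:, 0] + rng.normal(0, 1.0, size=50)

bayes = BayesianRidge().fit(X, y)

# Predict at three new points; std is the predictive standard deviation.
X_new = np.array([[2.0], [5.0], [8.0]])
mean, std = bayes.predict(X_new, return_std=True)
```

Here `mean` plays the role of the usual point prediction, and `std` gives a per-point measure of how confident the model is.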
from sklearn.linear_model import BayesianRidge
model2 = BayesianRidge(compute_score=True)
model2.fit(x_train,y_train)
BayesianRidge(compute_score=True)
model2.score(x_train,y_train)
0.9999999999999992
lw = 2
plt.figure(figsize=(10, 10))
plt.title("Actual vs. predicted total doses (Bayesian ridge)")
# Plot both series against row position so the two lines share an x-axis
plt.plot(dataset2['Total Doses Administered'].to_numpy(), color="red", linewidth=lw, label="data")
plt.plot(model2.predict(x_train), color="green", linestyle="--", label="bayesian estimate")
plt.xlabel("Row position")
plt.ylabel("Total doses administered")
plt.legend(loc="upper left")
<matplotlib.legend.Legend at 0x18ae536b040>
lw = 2
plt.figure(figsize=(10, 10))
plt.title("Actual vs. predicted total doses (all models)")
# Plot every series against row position so the lines share an x-axis
plt.plot(dataset2['Total Doses Administered'].to_numpy(), color="red", linewidth=lw, label="data")
plt.plot(model2.predict(x_train), color="green", linestyle="--", label="bayesian estimate")
plt.plot(model1.predict(x_train), color="blue", linestyle="--", label="random forest estimate")
plt.plot(model.predict(x_train), color="gold", linestyle="--", label="linear estimate")
plt.xlabel("Row position")
plt.ylabel("Total doses administered")
plt.legend(loc="upper left")
<matplotlib.legend.Legend at 0x18ae05efac0>
plt.figure(figsize=(6, 5))
plt.title("Marginal log-likelihood")
plt.plot(model2.scores_, color="navy", linewidth=lw)
plt.ylabel("Score")
plt.xlabel("Iterations")
Text(0.5, 0, 'Iterations')
The coronavirus disease continues to spread across the world along a trajectory that is difficult to predict. The health, humanitarian and socio-economic policies adopted by countries will determine the speed and strength of the recovery. The analysis above shows that COVID-19 hit India severely: many people died, and many recovered. States with more international contact, such as Maharashtra, Karnataka, Tamil Nadu, Kerala, Gujarat and Delhi, appear to have been affected more than others. The totals of around 5 billion cured and 73 million deaths are far larger than India's population allows, which suggests they are sums of running (cumulative) daily values rather than counts of people. The same caveat applies to the vaccination totals: the roughly 7.1 billion male and 6.3 billion female doses computed above add up the daily rows, so they indicate the relative split (about 53% male and 47% female) and not the number of people vaccinated. The vaccination data, which runs to mid-August 2021, suggests that the rollout reached every state and union territory, with the largest totals in Maharashtra, Uttar Pradesh, Rajasthan and Gujarat.